Vocabulary Independent OOV D Vector Mach
Abstract
This paper proposes a novel out-of-vocabulary (OOV) word detection method based on phoneme-level acoustic measures and Support Vector Machines (SVMs). Word-level OOV scores are computed from the phoneme-level in-vocabulary (IV) and OOV information provided by an HMM-based speech recognizer. The OOV decision for a word is made by an SVM classifier that processes the resulting confidence feature vector. The decision thresholds are independent of the test vocabulary used. The proposed SVM classification scheme was compared experimentally with word-level and sub-word-level confidence methods. The tests indicate that SVM-based OOV rejection generalizes best to the test set: while all methods performed similarly after parameter optimization on the training set, the proposed SVM classification scheme decreased the false acceptance rate on the test set by 30.4% compared with the word-level confidence method using experimentally determined decision thresholds.
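The pipeline outlined in the abstract (phoneme-level scores, a word-level confidence feature vector, an SVM decision) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the per-phoneme scores are invented toy values standing in for an IV-versus-OOV acoustic measure, the two features (mean and minimum score) are hypothetical, and the classifier is a plain linear SVM trained by sub-gradient descent on the hinge loss rather than whatever kernel and trainer the authors used.

```python
from statistics import mean

def word_features(phoneme_scores):
    """Collapse per-phoneme scores into a fixed-length word-level vector.
    The choice of mean and minimum is hypothetical, not the paper's feature set."""
    return [mean(phoneme_scores), min(phoneme_scores)]

def decision(w, b, x):
    """Signed score of the linear SVM: positive means IV, negative means OOV."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=500):
    """Full-batch sub-gradient descent on the L2-regularised hinge loss."""
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]   # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(samples, labels):
            if y * decision(w, b, x) < 1.0:   # sample violates the margin
                for k in range(dim):
                    grad_w[k] -= y * x[k] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy per-phoneme scores (assumed): positive values favour the IV hypothesis.
IV_WORDS = [[1.2, 0.8, 1.0], [0.9, 1.1, 0.7, 1.3], [1.5, 0.6]]
OOV_WORDS = [[-1.0, -0.8, -1.2], [-0.5, -1.5, -0.9], [-1.1, -0.7]]

samples = [word_features(s) for s in IV_WORDS + OOV_WORDS]
labels = [1] * len(IV_WORDS) + [-1] * len(OOV_WORDS)
w, b = train_linear_svm(samples, labels)

def is_oov(phoneme_scores):
    """Reject a word as OOV when the SVM score is negative."""
    return decision(w, b, word_features(phoneme_scores)) < 0.0
```

Because the classifier sees only acoustic confidence features and no word identities, the same trained decision function can be applied to words from any vocabulary, which is the sense in which such a scheme can be vocabulary independent. For example, `is_oov([-1.6, -1.3, -1.4])` rejects a word whose phonemes all score poorly, while `is_oov([1.4, 1.1, 1.3])` accepts one whose phonemes all score well.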
Similar resources
Learning units for domain-independent out-of-vocabulary word modelling
This paper describes our recent work on detecting and recognizing out-of-vocabulary (OOV) words for robust speech recognition and understanding. To allow for OOV recognition within a word-based recognizer, the in-vocabulary (IV) word network is augmented with an OOV word model so that OOV words are considered simultaneously with IV words during recognition. We explore several configurations for...
Speaker-independent name dialing with out-of-vocabulary rejection
In this paper we propose a system for speaker-independent name dialing in which a name enrolled by one user can be used by other family members or by co-workers in an office. We use speaker-independent sub-word models during enrollment; the recognized sub-word string is later used during recognition. We also present a mechanism for rejecting out-of-vocabulary (OOV) phrases. The best in-vocabulary...
Replacing OOV Words For Dependency Parsing With Distributional Semantics
Lexical information is an important feature in syntactic processing tasks such as part-of-speech (POS) tagging and dependency parsing. However, no such information is available for out-of-vocabulary (OOV) words, which causes many classification errors. We propose to replace OOV words with in-vocabulary words that are semantically similar according to distributional similar words computed from a lar...
Term-dependent confidence for out-of-vocabulary term detection
Within a spoken term detection (STD) system, the decision maker plays an important role in retrieving reliable detections. Most of the state-of-the-art STD systems make decisions based on a confidence measure that is term-independent, which poses a serious problem for out-of-vocabulary (OOV) term detection. In this paper, we study a term-dependent confidence measure based on confidence normalis...
Optimal size, freshness and time-frame for voice search vocabulary
In this paper, we investigate how to optimize the vocabulary for a voice search language model. The metric we optimize over is the out-of-vocabulary (OoV) rate since it is a strong indicator of user experience. In a departure from the usual way of measuring OoV rates, web search logs allow us to compute the per-session OoV rate and thus estimate the percentage of users that experience a given O...